Chapter 5 Distributions

In this chapter we learn about the different types of data distributions we are likely to encounter in the wild.

5.1 Discrete distributions

A discrete random variable has a finite or countably infinite number of possible values. As the name suggests, it models data that take distinct, separate values, typically integer counts. Below we provide options to generate and visualise data belonging to several classes of discrete distributions. Later (Chapter X) we will learn how to transform these data prior to performing the appropriate statistical analysis.

5.1.1 Bernoulli distribution

A Bernoulli random variable, \(x\), takes the value 1 with probability \(p\) and the value 0 with probability \(q=1-p\). It represents the result of a single experiment with a binary outcome (yes or no; black or white; positive or negative; success or failure; dead or alive), such as a coin toss, where the only two options are heads or tails. Here, \(p\) is the probability of one outcome and \(q\) the probability of the other. The distribution of the possible outcomes, \(x\), is given by:

\[ f(x;p)= \begin{cases} p, &\text{if}~x=1\\ 1-p, &\text{if}~x=0 \end{cases} \]
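The piecewise definition above translates almost directly into code. The following is a minimal Python sketch using only the standard library; the function names `bernoulli_pmf` and `bernoulli_sample`, the value \(p=0.3\), and the seed are assumptions chosen for illustration.

```python
import random


def bernoulli_pmf(x, p):
    """Return f(x; p): p if x is 1, 1 - p if x is 0, and 0 otherwise."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0


def bernoulli_sample(p, size, seed=None):
    """Draw `size` Bernoulli(p) values (each 0 or 1) from a seeded generator."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(size)]


p = 0.3  # assumed probability of the outcome coded as 1
bernoulli_draws = bernoulli_sample(p, size=10_000, seed=1)
bernoulli_prop = sum(bernoulli_draws) / len(bernoulli_draws)

print("f(1; p) =", bernoulli_pmf(1, p))
print("f(0; p) =", bernoulli_pmf(0, p))
print("observed proportion of ones:", bernoulli_prop)
```

With a large number of draws, the observed proportion of ones should settle close to \(p\).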

5.1.2 Binomial distribution

A binomial random variable, \(x\), is the sum of \(n\) independent Bernoulli random variables, each with parameter \(p\). This distribution results from repeating an identical experiment with a binary outcome \(n\) times, where each repetition succeeds with probability \(p\), and counting the number of successes. As such, it represents a collection of Bernoulli trials. The probability of observing \(x\) successes, for \(x=0,1,\ldots,n\), is:

\[f(x;n,p)= {n\choose x}p^{x}(1-p)^{n-x}\]
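To see how the formula relates to summed Bernoulli trials, the sketch below evaluates the probability mass function and also simulates draws by counting successes in \(n\) trials. It is a minimal Python illustration using only the standard library; the names `binomial_pmf` and `binomial_sample`, and the values \(n=10\), \(p=0.5\), are assumptions made for the example.

```python
import math
import random


def binomial_pmf(x, n, p):
    """Return f(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_sample(n, p, rng):
    """Sum n independent Bernoulli(p) trials to get one binomial draw."""
    return sum(1 for _ in range(n) if rng.random() < p)


n, p = 10, 0.5  # assumed: ten fair coin tosses
rng = random.Random(42)
binomial_draws = [binomial_sample(n, p, rng) for _ in range(5_000)]
binomial_probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
binomial_mean = sum(binomial_draws) / len(binomial_draws)

print("probabilities sum to:", sum(binomial_probs))
print("f(5; 10, 0.5) =", binomial_probs[5])
print("simulated mean:", binomial_mean, "expected n * p:", n * p)
```

The probabilities over \(x=0,\ldots,n\) sum to 1, and the simulated mean should lie close to \(np\).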

5.1.3 Negative binomial distribution

A negative binomial random variable, \(x\), counts the number of successes in a sequence of independent Bernoulli trials, each with success probability \(p\), before \(r\) failures occur. This distribution could, for example, be used to model the number of heads that result from a series of coin tosses before three tails are observed:

\[f(x;r,p)= {x+r-1\choose x}p^{x}(1-p)^{r}\]
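The coin-toss example can be checked numerically. The sketch below, in Python with only the standard library, computes the probability of \(x\) heads before \(r=3\) tails and simulates the same process directly. The function names, the fair-coin value \(p=0.5\), and the seed are assumptions for illustration; the comparison value \(rp/(1-p)\) is the standard mean of this parameterisation.

```python
import math
import random


def neg_binomial_pmf(x, r, p):
    """Return the probability of x successes before the r-th failure."""
    return math.comb(x + r - 1, x) * p**x * (1 - p) ** r


def neg_binomial_sample(r, p, rng):
    """Run Bernoulli(p) trials until r failures; return the success count."""
    successes = 0
    failures = 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes


r, p = 3, 0.5  # assumed: count heads until three tails with a fair coin
rng = random.Random(7)
heads_counts = [neg_binomial_sample(r, p, rng) for _ in range(5_000)]
heads_mean = sum(heads_counts) / len(heads_counts)

print("P(no heads before three tails) =", neg_binomial_pmf(0, r, p))
print("simulated mean number of heads:", heads_mean)
print("theoretical mean r * p / (1 - p):", r * p / (1 - p))
```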

5.1.4 Geometric distribution

A geometric random variable, \(x\), represents the number of trials that are required to observe a single success. Each trial is independent and has success probability \(p\). As an example, the geometric distribution is useful to model the number of times a die must be rolled in order for a six to be observed. For \(x=1,2,\ldots\), it is given by:

\[f(x;p)=(1-p)^{x-1}p\]
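The die example can be simulated directly. The Python sketch below (standard library only) counts rolls up to and including the first six, so \(x\) starts at 1 and the probability of needing exactly \(x\) rolls is \((1-p)^{x-1}p\); counting only the failed rolls before the first six would shift \(x\) down by one and give \((1-p)^{x}p\) instead. The function names and the seed are assumptions for illustration.

```python
import random


def geometric_pmf(x, p):
    """Return the probability that the first success occurs on trial x (x >= 1)."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p


def rolls_until_six(rng):
    """Roll a fair six-sided die until a six appears; return the number of rolls."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:
            return rolls


p_six = 1 / 6
rng = random.Random(2024)
roll_counts = [rolls_until_six(rng) for _ in range(5_000)]
roll_mean = sum(roll_counts) / len(roll_counts)

print("P(six on the first roll) =", geometric_pmf(1, p_six))
print("P(first six on the second roll) =", geometric_pmf(2, p_six))
print("simulated mean number of rolls:", roll_mean, "expected 1 / p:", 1 / p_six)
```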

5.1.5 Poisson distribution

A Poisson random variable, \(x\), tallies the number of events occurring in a fixed interval of time or space, given that these events occur independently with an average rate \(\lambda\). Poisson distributions can be used to model counts such as the number of meteors seen during a meteor shower or the number of people entering a shopping mall. This equation describes the Poisson distribution:

\[f(x;\lambda)=\frac{\lambda^{x}e^{-\lambda}}{x!}\]
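The link between a rate and a count can be illustrated with a small simulation. The Python sketch below (standard library only) evaluates the formula above and also generates counts by accumulating random waiting times within one unit of time. It assumes the standard result that events arriving at a constant rate \(\lambda\) have exponentially distributed waiting times, which is not stated in the text; the function names, \(\lambda=4\), and the seed are likewise illustrative.

```python
import math
import random


def poisson_pmf(x, lam):
    """Return f(x; lambda) = lambda^x * exp(-lambda) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)


def poisson_sample(lam, rng):
    """Count events in one unit of time, assuming exponential waiting times."""
    count = 0
    elapsed = rng.expovariate(lam)  # time of the first event
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)  # waiting time to the next event
    return count


lam = 4.0  # assumed average number of events per interval
rng = random.Random(11)
event_counts = [poisson_sample(lam, rng) for _ in range(5_000)]
event_mean = sum(event_counts) / len(event_counts)

print("P(no events) =", poisson_pmf(0, lam))
print("simulated mean count:", event_mean, "expected lambda:", lam)
```

The simulated mean count per interval should lie close to \(\lambda\).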

5.2 Continuous distributions

5.2.1 Uniform distribution

5.2.2 Normal distribution

Figure 5.1: Boxplot and probability density function of a normal distribution \(N(0, \sigma^{2})\). Credit: [Wikipedia](https://en.wikipedia.org/wiki/Probability_density_function)

5.2.3 Student's t distribution

5.2.4 Chi-squared distribution

5.2.5 Exponential distribution

5.2.6 F distribution

5.2.7 Gamma distribution

5.2.8 Beta distribution

5.3 Exercises

5.3.1 Exercise 1

Choose two different datasets and plot them as histograms with density curves overlaid. Label each with the distribution it appears to follow and stitch the plots together with ggarrange().